Introduction

The wine market is a dynamic, diversified sector of the economy where a number of factors affect how prices are formed. Historically, wine quality—which is frequently expressed in ratings—has been seen by both producers and customers as the main factor influencing pricing. But according to studies on consumer behavior and current market trends, wine pricing is significantly more intricate.

Motivation

Vivino.com is a popular website where wine lovers buy quality wine. Its information on wine varietals, pricing, and ratings offers a rare chance to investigate the complex relationship between wine attributes and consumer views, and we found a dataset scraped from the site. Motivated by a common love of wine among our enthusiast community, we want to explore the connection between wine prices and grape varietals, regions of origin, and overall scores. To shed light on the variables that affect how valuable and high-quality wines are perceived, this study looks for possible trends, preferences, and subtleties in the wine industry. With this study, we hope to:

  • Examine the relationship between wine prices and the kind of grapes included in the dataset.

  • Investigate the ways in which the location of wines affects their cost and rating.

  • Investigate whether higher-rated wines tend to fetch higher prices, and the overall effect of wine ratings on market worth.

Initial Questions

  • Highest Rated Wines Across Categories: Which type of wine, among the five major categories under consideration, has the highest overall ratings? Within each of the five categories, which year produces the most welcoming (highest-rated) wine?

  • Most Welcoming Wine Valleys: Which wine valley is known for producing the most welcoming wines?

  • Regions with Expensive Wines: Which country produces the most expensive wines? Within countries, which regions produce the most expensive wines?

  • Price Analysis by Category in a Specific Region: How do wine prices vary across the five categories in a particular region? Which visualization best shows how the distribution of prices varies within regions?

  • Relationship Between Ratings and Prices: Which statistical analysis is appropriate for exploring the correlation between wine ratings and prices? Does the relationship follow a simple linear regression model or a more complex multiple linear regression model?

  • The nature of the relationship: Is it positive (higher prices are associated with higher ratings), or do higher prices not necessarily mean higher ratings?

Data

We obtained our source data from https://www.kaggle.com/datasets/budnyak/wine-rating-and-price/code. Originally, the source data contained five individual CSV files: Red.csv, Rose.csv, Sparkling.csv, Varieties.csv, and White.csv. We filtered out Varieties.csv because its content is not consistent with the other files. The remaining files have the same variables, so we merged them into one dataset.

Clean and merge the data

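library(tidyverse)  # read_csv, map, tibble, unnest, dplyr verbs, stringr
library(plotly)     # plot_ly, layout, ggplotly (used in later sections)

# Read every CSV file in ./data and keep the file path next to its contents
categories = list.files(path = "./data", full.names = TRUE)
files_data = map(categories, read_csv)
nested_file = tibble(categories, files_data)
wine_rating = unnest(nested_file, cols = c(files_data))

# Clean column names, reduce the file path to the category name (e.g. "Red"),
# and drop the variety column
wine_rating = 
  wine_rating |> 
  janitor::clean_names() |> 
  mutate(categories = str_extract(categories, "[A-Z][a-z]+")) |> 
  select(-variety)
write_csv(wine_rating, file = "wine_rating.csv")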

When attempting to merge the four datasets, we encountered an initial divergence in our approach. We employed two distinct methods to bind the datasets: one involved using a left join, while the other entailed combining the data using the map function. Both methods successfully merged the four datasets into a comprehensive dataset encompassing four wine categories. However, during our utilization of the left join method, we encountered issues such as the omission of category names and the development of complex code. Consequently, we concluded that employing the map function would be a more suitable approach for consolidating all the data.
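To illustrate the map-based approach, the following is a minimal sketch rather than our exact pipeline. It writes two small hypothetical CSV files to a temporary folder, reads them with map, and stacks them with bind_rows, using each file name as the category label. The file contents and the names tmp_dir and combined are assumptions made for the example.

```r
library(dplyr)
library(purrr)

# Hypothetical demo files in a temporary folder (not the real Vivino data)
tmp_dir <- tempfile("wine_demo_")
dir.create(tmp_dir)
write.csv(data.frame(Name = c("A 2015", "B 2016"), Price = c(10, 20)),
          file.path(tmp_dir, "Red.csv"), row.names = FALSE)
write.csv(data.frame(Name = "C 2018", Price = 15),
          file.path(tmp_dir, "White.csv"), row.names = FALSE)

# Name each path after its file so the category label survives the merge
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
names(files) <- tools::file_path_sans_ext(basename(files))

# Read every file, then stack the rows; .id stores the list names in a column
combined <- files |>
  map(read.csv) |>
  bind_rows(.id = "categories")

combined
```

Because the category comes from the list names, no separate join step is needed to recover which file a row came from.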

head(wine_rating, 10) |> 
  knitr::kable(digits = 3) |> 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover"), font_size = 12) |> 
  kableExtra::scroll_box(width = "100%", height = "300px")
| categories | name | country | region | winery | rating | number_of_ratings | price | year |
|---|---|---|---|---|---|---|---|---|
| Red | Pomerol 2011 | France | Pomerol | Château La Providence | 4.2 | 100 | 95.00 | 2011 |
| Red | Lirac 2017 | France | Lirac | Château Mont-Redon | 4.3 | 100 | 15.50 | 2017 |
| Red | Erta e China Rosso di Toscana 2015 | Italy | Toscana | Renzo Masi | 3.9 | 100 | 7.45 | 2015 |
| Red | Bardolino 2019 | Italy | Bardolino | Cavalchina | 3.5 | 100 | 8.72 | 2019 |
| Red | Ried Scheibner Pinot Noir 2016 | Austria | Carnuntum | Markowitsch | 3.9 | 100 | 29.15 | 2016 |
| Red | Gigondas (Nobles Terrasses) 2017 | France | Gigondas | Vieux Clocher | 3.7 | 100 | 19.90 | 2017 |
| Red | Marion’s Vineyard Pinot Noir 2016 | New Zealand | Wairarapa | Schubert | 4.0 | 100 | 43.87 | 2016 |
| Red | Red Blend 2014 | Chile | Itata Valley | Viña La Causa | 3.9 | 100 | 17.52 | 2014 |
| Red | Chianti 2015 | Italy | Chianti | Castello Montaùto | 3.6 | 100 | 10.75 | 2015 |
| Red | Tradition 2014 | France | Minervois | Domaine des Aires Hautes | 3.5 | 100 | 6.90 | 2014 |

The resulting wine_rating dataset has 15346 rows and 9 columns, one column per variable. The variables are described below:

  • categories: Four types of the wine

  • name: Name of each wine

  • country: Production country of the wine

  • region: Production place of the wine

  • winery: The production winery or vineyards

  • rating: The score each wine received from consumers

  • number_of_ratings: Number of ratings the wine received

  • price: The selling price of each wine

  • year: Production year

Exploratory Data Analysis

We analyzed the dataset from two angles: wine price versus geographic region and wine rating versus geographic region. To visualize the global distribution of wine prices, we first created a world map illustrating the average wine price of each country in the dataset. We then identified the 10 countries with the highest average prices and showed them on a world map as well.

Wine Price by Country

World Map of Average Wine Price by Country

price_country=
  wine_rating |>
  group_by(country)|>
  summarise(avg_by_country = mean(price)) |>
  filter(!is.na(avg_by_country))

world_map_data <- 
  price_country |>
  mutate(text = paste(country, "<br>Avg Price: $", round(avg_by_country, 2)))

fig_country_all <- plot_ly(
  data = world_map_data,
  type = "choropleth",
  locations = ~country,
  locationmode = "country names",
  z = ~avg_by_country,
  text = ~text,
  colorscale = "Viridis"
)

fig_country_all <- 
  fig_country_all|>
  layout(
    geo = list(
      showframe = FALSE,
      projection = list(type = 'mercator')
    ),
    title = "Average Wine Price by Country"
  )

fig_country_all

This world map visualizes average wine prices by country, offering an overview of the global wine market as captured by the dataset. Each country is shaded according to its average wine price, with darker shades indicating lower average prices and lighter shades indicating higher ones. Notably, the United Kingdom has the lightest shade, signifying the highest average wine price on the map, while Mexico has the darkest shade, indicating the lowest average price.

World Map of the 10 Most Expensive Countries

price_country <- 
  price_country |>
  arrange(desc(avg_by_country))

# Keep the 10 countries with the highest average price and number them by rank
top_10_country = 
  head(price_country, 10) |> 
  mutate(rank = row_number())

country10_map_data <- top_10_country |>
  mutate(text = paste("Rank: ", rank, "<br>Country: ", country, "<br>Avg Price: $", round(avg_by_country, 2)))

fig_country_10map <- plot_ly(
  data = country10_map_data,
  type = "choropleth",
  locations = ~country,
  locationmode = "country names",
  z = ~avg_by_country,
  text = ~text,
  colorscale = "YlGnBu"
)

fig_country_10map <- 
  fig_country_10map |>
  layout(
    geo = list(
      showframe = FALSE,
      projection = list(type = 'mercator')
    ),
    title = "Top 10 Countries Based on Average Wine Price"
  )

fig_country_10map

This map visually presents the top 10 countries with the highest average wine prices. The countries are ranked based on their average wine prices, with darker shades indicating lower prices. Each country is labeled with its rank, name, and the corresponding average price in US dollars. Based on this map, the top three most expensive countries are the UK, France, and the US.

Wine Price by Region

price_region=
  wine_rating |>
  group_by(country,region)|>
  summarise(avg_by_region = mean(price)) |>
  filter(!is.na(avg_by_region))|>
  arrange(desc(avg_by_region))

Bar Plot of the 10 Most Expensive Regions

top_10_regions <- head(price_region, 10)

fig_region10 <- plot_ly(
  data = top_10_regions,
  type = "bar",
  x = ~reorder(region, -avg_by_region),
  y = ~avg_by_region,
  color = ~country,
  text = ~paste("Country: ", country, "<br>Avg Price: $", round(avg_by_region, 2)),
  marker = list(size = 10)
)

fig_region10 <- 
  fig_region10 |>
  layout(
    title = "Top 10 Regions Based on Average Wine Price",
    xaxis = list(title = "Region"),
    yaxis = list(title = "Average Wine Price"),
    showlegend = TRUE
  )

fig_region10

This bar chart displays the top 10 regions with the highest average wine prices. Each bar on the x-axis represents a region, sorted in descending order of average price, and its height corresponds to the average wine price on the y-axis. Bars are color-coded by country, as identified in the legend. France prominently leads the list of the most expensive areas, with eight of the top 10 regions located there.

Wine Price by Winery

price_winery=
  wine_rating |>
  group_by(country,region, winery)|>
  summarise(avg_by_winery = mean(price)) |>
  filter(!is.na(avg_by_winery))|>
  arrange(desc(avg_by_winery))

Price Distribution for the Top 10 Wineries

price_winery_dis <- wine_rating %>%
  group_by(country, region, winery) %>%
  mutate(avg_by_winery = mean(price)) %>%
  filter(!is.na(avg_by_winery)) %>%
  select(country, region, winery, price, avg_by_winery) %>%
  arrange(desc(avg_by_winery))

top_10_wineries <- price_winery_dis %>%
  group_by(avg_by_winery) %>%
  nest() %>%
  arrange(desc(avg_by_winery)) %>%
  head(10) %>%
  unnest(cols = data)

# Create a Plotly boxplot with color-coded markers by region
boxplot_plotly <- plot_ly(
  data = top_10_wineries,
  type = "box",
  x = ~winery,
  y = ~price,
  color = ~region,   # Color by region
  colors = "Set3"    # Choose a color scale (Set3 provides clear distinctions)
)

# Customize layout
boxplot_plotly <- boxplot_plotly %>%
  layout(
    title = "Price Distribution for Top 10 Wineries",
    xaxis = list(title = "Winery"),
    yaxis = list(title = "Price"),
    showlegend = TRUE   # Display legend for region colors
  )

# Display the Plotly plot
boxplot_plotly

This boxplot illustrates the price distribution of the top 10 wineries, with each box color-coded by region. Each box represents a winery, featuring a median line indicating the central price, box bounds denoting the interquartile range, and whiskers showing the price spread. However, some wineries appear as short lines, which might be due to limited data points or homogeneous prices in these wineries.
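The box components described above can be computed directly. The sketch below uses made-up prices for two hypothetical wineries, one with several wines and one with a single wine, to show why a winery with very few data points collapses to a short line. The helper box_stats is an assumption for illustration, and plotly may use a slightly different quartile method than R's default.

```r
# Made-up prices for two hypothetical wineries
prices_many <- c(120, 150, 180, 220, 400)  # several wines
prices_one  <- c(950)                      # a single wine

# Median, quartiles, and interquartile range: the numbers a box is drawn from
box_stats <- function(x) {
  c(median = median(x),
    q1     = unname(quantile(x, 0.25)),
    q3     = unname(quantile(x, 0.75)),
    iqr    = IQR(x))
}

stats_many <- box_stats(prices_many)
stats_one  <- box_stats(prices_one)

stats_many  # median 180, q1 150, q3 220, iqr 70
stats_one   # median, q1, and q3 all equal 950, so the box has zero height
```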

Wine Price in the United States

price_us=
  wine_rating |>
  filter(country=="United States")|>
  select(country,region, winery,price)

price_us_200=
  price_us|>
  filter(price<200)

# Create a histogram
histogram_plotly_us <- plot_ly(
  data = price_us_200,
  type = "histogram",
  x = ~price,
  nbinsx = 10,  # Adjust the number of bins as needed
  marker = list(color = "lightblue")
)

# Customize layout
histogram_plotly_us <- histogram_plotly_us |>
  layout(
    title = "Wine Price Distribution in the US (Filtered under 200)",
    xaxis = list(title = "Price"),
    yaxis = list(title = "Frequency")
  )

# Display the histogram
histogram_plotly_us

This histogram illustrates the distribution of wine prices in the United States, with a specific focus on wines priced under $200. The x-axis represents the price range, while the y-axis denotes the frequency of wines falling within each price category. This visualization provides a quick overview of the frequency distribution of wine prices in the United States under the specified threshold. Notably, the histogram reveals that the majority of wines in the United States are concentrated in the price range of 0 to 19.99 dollars.
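To make the binning concrete, here is a small sketch that counts wines per $20 price band with cut and table. The prices are invented for the example, and plotly may choose slightly different bin edges than the fixed ones used here.

```r
# Invented prices, all under the $200 threshold used in the histogram
prices <- c(5, 12, 19.99, 20, 45, 150, 199)

# Left-closed $20 bands: [0,20), [20,40), ..., [180,200)
bins <- cut(prices, breaks = seq(0, 200, by = 20), right = FALSE)
bin_counts <- table(bins)

bin_counts[bin_counts > 0]
# Three of the seven prices fall in [0,20); a price of exactly 20 goes to [20,40)
```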

# Summarize ratings by country, year, and category for wines with more than 2000 ratings
wine_summary <- wine_rating %>%
  group_by(country, year, categories) %>%
  filter(number_of_ratings > 2000) %>%
  summarise(mean_rating = mean(rating),
            sd_rating = sd(rating),
            n = n()) %>%
  ungroup()
# Remove missing values and duplicates; strip digits (the vintage year) from the wine name
wine_rating = 
  wine_rating |>
  na.omit() |>
  distinct() |> 
  mutate(name = gsub("\\d", "", name), year = as.numeric(year),
         name = iconv(name, from = "", to = "UTF-8", sub = ""),
         winery = iconv(winery, from = "", to = "UTF-8", sub = ""),
         region = iconv(region, from = "", to = "UTF-8", sub = ""))

Relationship between Wine prices and rating for different categories.

This scatter plot shows the relationship between wine prices and ratings across different categories.

# Plotly scatter plot of rating against price, colored by category
plot_ly(wine_rating, x = ~price, y = ~rating, type = 'scatter', mode = 'markers', color = ~categories) %>%
  layout(title = "Wine rating versus price by category",
         xaxis = list(title = "Price"),
         yaxis = list(title = "Rating"))

In this graph, points are colored by category: red, rosé, sparkling, and white wines. The graph suggests that there is a wide range of prices within each wine category, and there does not appear to be a strong correlation between price and rating. Some high-rated wines are available at moderate prices, while some expensive wines have lower ratings. For red wine, however, there is a visible trend: higher-priced red wines tend to have higher ratings.
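One way to put a number on the visual impression is a rank correlation computed separately for each category. The sketch below uses a tiny invented data frame (toy) rather than the Vivino data, and Spearman's method is our assumption here because it does not require the relationship to be linear.

```r
# Invented toy data: price rises with rating for "Red" but barely for "White"
toy <- data.frame(
  categories = rep(c("Red", "White"), each = 4),
  price      = c(10, 20, 40, 80, 8, 12, 30, 60),
  rating     = c(3.6, 3.8, 4.1, 4.4, 3.9, 3.7, 3.8, 3.9)
)

# Spearman rank correlation between price and rating within each category
cor_by_cat <- sapply(
  split(toy, toy$categories),
  function(d) cor(d$price, d$rating, method = "spearman")
)

round(cor_by_cat, 2)
```

In the toy data the red wines are perfectly monotonic while the white wines show only a weak association, which mirrors the kind of contrast described above.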

Within countries, highest rating regions

# Within countries, highest-rating regions (wines with more than 2000 ratings)
wine_rating_summary <- wine_rating %>%
  group_by(country, region) %>%
  filter(number_of_ratings > 2000) %>%
  summarise(mean_rating = mean(rating),
            sd_rating = sd(rating),
            n = n()) %>%
  ungroup() %>%
  arrange(region, desc(mean_rating)) %>%
  # top_n() without `wt` ranks by the last column (here n) and keeps ties
  top_n(20)

# Plotly bar plot
plot_ly(wine_rating_summary, x = ~reorder(region, -mean_rating), y = ~mean_rating, type = 'bar', color = ~country) %>%
  layout(title = "Average wine rating by region (Within countries)",
         xaxis = list(title = "Region"),
         yaxis = list(title = "Mean Rating"))

The graph only includes wines that received more than 2,000 ratings. This bar chart compares the average wine rating by region within various countries, with regions ordered by their mean rating. The visualization indicates that certain regions consistently produce higher-rated wines, but it also reflects the diversity within countries. For instance, wines from regions like Napa Valley, Barolo, and Bordeaux may have higher average ratings compared to other regions. Italy by far tops the list of countries with the highest-rated wine-producing regions: of the 27 regions shown, almost half (13) are in Italy.
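When the goal is to keep the regions with the highest mean rating, the ranking column can be named explicitly. The following is a sketch on invented data using dplyr's slice_max; the data frame toy and its values are assumptions for illustration only.

```r
library(dplyr)

# Invented region summaries: mean rating and number of qualifying wines
toy <- data.frame(
  region      = c("A", "B", "C", "D"),
  mean_rating = c(4.5, 4.1, 4.7, 3.9),
  n           = c(2, 10, 1, 8)
)

# Keep the two regions with the highest mean rating, best first
top2 <- toy |>
  slice_max(mean_rating, n = 2)

top2$region  # "C" then "A"
```

Naming the column makes the selection independent of column order, so adding or reordering summary columns does not silently change which regions are kept.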

Rating analysis by category in a specific region (e.g., Napa Valley):

Among wine regions, the most famous ones we may know are Napa Valley, Napa County, and California. For a closer look, we narrow the analysis down to the ratings of these individual regions and the wines that they produce.

# Data Preparation
rating_analysis_data <- wine_rating |>
  filter(region %in% c("Napa Valley", "Napa County", "California")) |>
  filter(categories != "Rose") |> 
  group_by(region, categories) |>
  summarise(mean_rating = mean(rating)) |>
  spread(key = categories, value = mean_rating)

# Convert to matrix (required for heatmap)
rating_matrix <- as.matrix(rating_analysis_data[,-1])
rownames(rating_matrix) <- rating_analysis_data$region

# Plotly Heatmap
fig <- plot_ly(x = colnames(rating_matrix), 
               y = rownames(rating_matrix), 
               z = rating_matrix, 
               type = "heatmap",
               colorscale = "Viridis") %>%
  layout(title = 'Rating Analysis by Wine Category in Napa Valley, Napa County, and California',
         xaxis = list(title = 'Wine Category'),
         yaxis = list(title = 'Region'))

fig

This heatmap shows the average ratings of different wine categories within Napa Valley, Napa County, and California as a whole. The color intensity reflects the mean rating. For most of us, the most familiar regions are Napa County and Napa Valley. From the graph, red wines from Napa Valley had the highest mean rating among the three regions, just as the White category of wines from Napa County had the highest mean rating. For the white wine, the production from Napa Valley had the highest mean ratings.

Building on the previous visualizations, we can analyze the most welcoming wine valleys.

Most Welcoming Wine Valleys

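# Keep wines with more than 2000 ratings and attach the mean rating and mean price
# of each region/category; number_of_ratings keeps one row per wine for bubble sizing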
welcoming_valleys_data <- wine_rating |>
  filter(number_of_ratings > 2000) |>
  group_by(region, categories) |>
  summarise(mean_rating = mean(rating), 
            number_of_ratings = number_of_ratings,
            mean_price = mean(price)) |>
  ungroup() |>
  filter(mean_rating > 4.3, mean_price > 0)  # Ensures that mean_price is greater than 0

# Create a Plotly Bubble Chart
fig <- plot_ly(data = welcoming_valleys_data, 
               x = ~mean_price, 
               y = ~mean_rating, 
               size = ~number_of_ratings, 
               color = ~region,
               colorscale = "Viridis",
               type = 'scatter', 
               mode = 'markers', 
               marker = list(sizemode = 'diameter', sizeref = 0.7, opacity = 0.4)) %>%
  layout(
  title = 'Most Welcoming Wine Valleys by Wine Category',
  xaxis = list(title = 'Mean Price', range = c(0, max(welcoming_valleys_data$mean_price))),
  yaxis = list(title = 'Mean Rating')
)

# Print the figure
fig

This bubble plot shows the relationship between the mean prices and mean ratings of wines, restricted to wines that had more than 2,000 ratings and whose mean rating was greater than 4.3, grouped by region and wine category. Regions with larger bubbles and higher placements on the y-axis (mean rating) could be interpreted as more “welcoming” due to their combination of high ratings and a significant number of ratings, suggesting popular approval. The mean price on the x-axis gives an additional dimension, indicating whether the quality comes with a higher cost. The region represented by the large orange bubble (the ‘Bolgheri Superiore’ region) seems to be the most “welcoming,” as it has a high mean rating and a significant number of reviews. The plot shows a wide range of mean prices, but a cluster of regions has mean ratings in the narrow range of approximately 4.3 to 4.6. Also, not all high-rated wines are expensive, and not all expensive wines are highly rated, indicating that price is not the sole determinant of quality as perceived by raters.

Numbers within Wine Types

all_wine_cate = wine_rating |> 
  filter(categories != "Varieties") |> 
  group_by(categories) |> 
  summarise(number = n())
all_wine_cate
## # A tibble: 4 × 2
##   categories number
##   <chr>       <int>
## 1 Red          8666
## 2 Rose          397
## 3 Sparkling    1007
## 4 White        3764

Distribution of wine type versus price

This section produces a box plot of the wine prices for each category. Wine prices are plotted on the y-axis, while wine categories are plotted on the x-axis. The color of each box is determined by the category. We draw the following conclusions from the plot we created.

price_plot= wine_rating |> 
  plot_ly(y = ~price, x= ~categories, color = ~ categories, type = "box", colors = "viridis") |> 
  layout(yaxis = list(range = c(0, 1500),dtick = 50))
price_plot

  • Red Wines: The price of red wines varies greatly, with several outliers showing that some are far more expensive than others. The priciest bottle of red wine costs up to $3417, and there are plenty of high-end options above $1000. With a median price of $18.2, however, red wine is typically affordable, even though premium selections exist.

  • Rosé Wines: The price distribution of rosé wines is more centered and has fewer outliers. This suggests more consistent pricing for rosé wines, with fewer expensive exceptions.

  • Sparkling Wines: Sparkling wines are priced in a moderate range, with a few outliers. Given that the median price of 19.45 is greater than that of red and rosé wines, it is possible that sparkling wines are typically seen as a more upscale choice.

  • White Wines: With a wide price range and a few outliers, white wines follow a pattern akin to that of red wines. However, the median price of 13.15 is less than that of red wines, suggesting that the typical price of white wines may be lower.

Distribution of wine type versus rating

Second, we make a box plot that, like the pricing plot, shows the distribution of ratings across several wine categories. The rating of the wines is shown on the y-axis in this case. Given that wine ratings usually range from 0 to 5, the y-axis range is set at 0 to 5. There are 0.2 intervals between each tick mark.

rating_plot= wine_rating |> 
  plot_ly(y = ~rating, x= ~categories, color = ~ categories, type = "box", colors = "viridis") |> 
  layout(yaxis = list(range = c(0, 5), dtick = 0.2))
rating_plot

The overall distributions of ratings for various wines don’t differ significantly. However, we may still draw some conclusions from the plot we created.

  • Red Wines: Red wines have the highest median rating. Compared with other categories, their ratings are somewhat broadly distributed. A few low outliers might mean that certain red wines are not as well regarded as others.

  • Rosé Wines: The interquartile range (IQR) of rosé wines is smaller, indicating a higher degree of consistency in evaluations.

  • Sparkling Wines: The narrow IQR and lack of notable outliers in sparkling wines suggest a uniform ranking throughout the category. The high median rating may indicate that quality is usually viewed favorably.

  • White Wines: White wines are distributed in a manner similar to that of red wines. The IQR of white wines is modest. Overall, the evaluations seem to indicate that white wines are well rated, with a few exceptions at the lower end.

Statistical Analysis

Log transformed price versus rating

ggplot_rp_tf = wine_rating |> 
  ggplot(aes(x = log(price), y = rating, color = year)) +
  geom_point()
ggplotly(ggplot_rp_tf)

Having examined the scatter plot of wine rating versus wine price, we noticed that the relationship can hardly be described as linear based on the distribution of the points. So we applied a logarithmic transformation to the price variable and drew the scatter plot again. The result is more satisfactory than the previous one: the points are more tightly clustered and form a roughly positive linear relationship. To build a more accurate linear model, we considered that other variables might also belong in the regression, which makes a multiple linear regression more appropriate.
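The effect of the transformation can be seen on a small constructed example. In the sketch below, rating is defined as an exact linear function of log(price); the numbers are invented and only illustrate why a log scale can turn a curved relationship into a straight one.

```r
# Invented prices that double at each step, and a rating that is linear in log(price)
price  <- c(5, 10, 20, 40, 80, 160, 320)
rating <- 3 + 0.25 * log(price)

# Pearson correlation measures linear association
cor_raw <- cor(price, rating)       # below 1: curved on the raw price scale
cor_log <- cor(log(price), rating)  # equal to 1 up to rounding: straight line

c(raw = cor_raw, log = cor_log)
```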

Fitting linear model to rating, price, and year (residual plot)

lm_rp = lm(rating ~ log(price) + year, data = wine_rating)

broom::tidy(lm_rp) |> 
  knitr::kable(digits = 3)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -15.657 | 1.294 | -12.097 | 0 |
| log(price) | 0.268 | 0.002 | 113.488 | 0 |
| year | 0.009 | 0.001 | 14.516 | 0 |

With these thoughts in mind, we fitted a multiple linear regression (MLR) of wine rating on log-transformed wine price and production year. The results are shown in the table above. Things get a little tricky here. The p-value of log-transformed price is nearly zero, indicating a statistically significant linear association with the dependent variable. The year term also has a p-value near zero in this table, but its estimated effect is small (0.009 rating points per year), so more evidence is needed to judge the true relation between year and rating. So, we plotted the residuals of this linear regression model.
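Because price enters the model on a log scale, the log(price) coefficient is easiest to read in terms of multiplicative price changes. As a worked example using the estimate 0.268 from the table above and holding year fixed, multiplying the price by a factor k changes the predicted rating by 0.268 × log(k). The variable names below are ours.

```r
# Estimate for log(price) taken from the regression table above
beta_log_price <- 0.268

# Predicted change in rating when price is multiplied by k is beta * log(k)
change_if_doubled <- beta_log_price * log(2)
change_if_tenfold <- beta_log_price * log(10)

round(c(doubled = change_if_doubled, tenfold = change_if_tenfold), 3)
# doubled: 0.186, tenfold: 0.617
```

Under this model, doubling the price is associated with a predicted rating that is about 0.19 points higher, which is an association in the data and not a causal claim.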

resid_rp = wine_rating |> 
  modelr::add_residuals(lm_rp) |> 
  ggplot(aes(x = log(price), y = resid)) + geom_point()
ggplotly(resid_rp)

From the plot, it is clear that the model could fit better, since the points are not evenly spread across the axis: they become more condensed as the log-transformed price increases. Still, these results alone do not let us conclude that the model is wrong. To further test our model, we performed a simulation.
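The pattern of residuals narrowing as log(price) grows can also be checked numerically. The sketch below does not use our data: it simulates ratings whose noise shrinks with log(price), fits a simple linear model, and compares the residual spread in the lower and upper halves of log(price). All parameter values are assumptions chosen for illustration.

```r
set.seed(1)

# Simulated data: noise standard deviation shrinks as log(price) increases
n         <- 1000
log_price <- runif(n, min = 1.5, max = 5)
noise_sd  <- 0.5 - 0.08 * log_price            # roughly 0.38 down to 0.10
rating    <- 3 + 0.25 * log_price + rnorm(n, sd = noise_sd)

# Fit the model and extract residuals
fit <- lm(rating ~ log_price)
res <- resid(fit)

# Compare residual spread below and above the median log(price)
upper_half <- log_price > median(log_price)
sd_low  <- sd(res[!upper_half])
sd_high <- sd(res[upper_half])

c(sd_low = sd_low, sd_high = sd_high)
```

A clearly larger spread in one half than the other is a sign of non-constant variance, the same feature that is visible in the residual plot.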

Bootstrapping the above model as a further test

bootstrap_rp = wine_rating |> 
  modelr::bootstrap(n = 1000) |> 
  mutate(
    models = map(strap, \(df) lm(rating ~ log(price) + year, data = df)),
    results = map(models, broom::tidy)) |> 
  select(results) |> 
  unnest(results) |> 
  filter(term == "log(price)") |> 
  ggplot(aes(x = estimate)) + geom_density()
ggplotly(bootstrap_rp)

From the bootstrap above, we obtained an approximately normal distribution of the log(price) coefficient estimates. The density curve shows no visible skewness or shoulders, which suggests that the estimate is stable across resamples and is not driven by extreme outliers or other strange data points. This provides supporting evidence for our linear regression model. Still, more tests are needed before the model can be considered accurate.
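The same idea can be written out in base R to show what each bootstrap replicate does. This is a sketch on simulated data, not a rerun of our analysis: it resamples rows with replacement, refits the model each time, and summarizes the log(price) estimates with a percentile interval. The simulated coefficients and the names sim, boot_est, and ci are assumptions.

```r
set.seed(2)

# Simulated stand-in for the wine data
n   <- 200
sim <- data.frame(price = exp(runif(n, 1.5, 5)))
sim$rating <- 3 + 0.25 * log(sim$price) + rnorm(n, sd = 0.3)

# Estimate from the full sample
full_est <- coef(lm(rating ~ log(price), data = sim))[["log(price)"]]

# Each replicate: resample rows with replacement and refit the model
B <- 500
boot_est <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  coef(lm(rating ~ log(price), data = sim[idx, ]))[["log(price)"]]
})

# 95% percentile interval of the bootstrap estimates
ci <- quantile(boot_est, c(0.025, 0.975))
ci
```

The width of the interval describes how much the coefficient varies across resamples; it speaks to the stability of the estimate rather than to whether the linear form itself is correct.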

Discussion

Findings and Conclusion

Ratings versus regions

Based on the analysis of the relationship between ratings and regions, we found that every region, regardless of price, has wines with high ratings. However, the overall trend shows that wines with high ratings are mainly distributed in Italy and the United States, with a somewhat larger proportion in Italy. Among them, the Italian region Bolgheri Sassicaia is also shown in the bubble graph as having the highest prices and ratings. In the United States, the areas with higher ratings are consistent with our findings, mainly located in Napa Valley, Napa County, California. In summary, the relationship between ratings and regions is evident in specific areas, but because each person has unique criteria for judging wines, every region has wines that are highly rated.

Price versus regions

In terms of the relationship between price and region, the UK has the highest average wine prices, while Mexico tends to offer more budget-friendly options. At the regional level, France, Italy, and the US stand out as home to the most expensive wine-producing regions. For customers choosing wines from the world’s priciest regions, France proves to be an excellent option, boasting the highest number of expensive wine-producing areas. At the winery level, Pétrus, based in France, takes the lead as the most expensive. Within the US, wine prices generally fall within the range of $0 to $20, making them reasonably affordable for the majority of consumers.

Statistical Model Analysis

Gathering all the results from the statistical model analysis section, we cannot easily decide that the model we applied is the best model for our dataset. The linear regression model gave different results under different tests. Based on the residual plot and the broom::tidy() table alone, it is hard to conclude that the model is reliable. The bootstrap simulation, however, suggests that the linear model fits reasonably well: the distribution of the estimates is close to normal, meaning that the coefficient estimate is stable. In sum, the dataset needs some improvement, or other supporting datasets are needed, for further examination of the model.

Limitation, Insight, Further Improvements

The dataset becomes small after we filter it under certain conditions. For example, in the analyses of the top 10 most expensive countries and the most welcoming wines, the sample size of qualifying data is small, which may lead to bias.

Customer Selection: Value-conscious consumers may go toward red or rosé wines, which provide more reasonably priced selections with a greater variety of ratings.

Perception of Quality: According to the ratings, people believe red and sparkling wines to be of greater quality. This perception may be influenced by the way the wines are made, how the market is regarded, or by other elements that are hidden from view in the plots. Additionally, if you’d like to sample some premium alternatives, red wines will be the best options.